Abstract

Predicting biological structure has remained challenging for systems such as disordered proteins that take on myriad conformations. Hybrid simulation/experiment strategies have been undermined by difficulties in evaluating errors from computational model inaccuracies and data uncertainties. Building on recent proposals from maximum entropy theory and nonequilibrium thermodynamics, we address these issues through a Bayesian Energy Landscape Tilting (BELT) scheme for computing Bayesian "hyperensembles" over conformational ensembles. BELT uses Markov chain Monte Carlo to directly sample maximum-entropy conformational ensembles consistent with a set of input experimental observables. To test this framework, we apply BELT to model trialanine, starting from disagreeing simulations with the force fields ff96, ff99, ff99sbnmr-ildn, CHARMM27, and OPLS-AA. BELT incorporation of limited chemical shift and scalar coupling measurements gives convergent values of the peptide's α, β, and PPII conformational populations in all cases. As a test of predictive power, all five BELT hyperensembles recover set-aside measurements not used in the fitting and report accurate errors, even when starting from highly inaccurate simulations. BELT's principled framework thus enables practical predictions for complex biomolecular systems from discordant simulations and sparse data.

Key words: Molecular Dynamics, NMR, Conformational Ensembles, Bayesian Statistics 
Running Title: Bayesian Energy Landscape Tilting 
Introduction

The past forty years have seen the experimental determination of "ground-state" structures 
for countless biological macromolecules (1). Modern biology, however, presents many systems 
that do not fit a single-structure paradigm. "Excited" conformational states of nucleic 
acids (2), natively disordered proteins (3), and protein folding intermediates (4) are all 
poorly described by single-conformation models. For such systems, models of conformational
ensembles are required to understand and to predict structural and equilibrium properties. 

A growing body of research has sought to characterize structural ensembles. Much of this work has focused on incorporating dynamical information during NMR structure determination (5, 6) or on extracting multiple conformers from X-ray diffraction data (7, 8). While these techniques are powerful, they share difficulties in data collection, in the unified treatment of heterogeneous experimental data, and in data sparseness relative to the number of degrees of freedom. In particular, conformational ensemble modeling requires the estimation of not just a single structure, but a collection of structures and their associated equilibrium populations. This highly underdetermined problem involves the simultaneous estimation of approximately 3 × N × m parameters, where m is the number of states in the ensemble and N is the number of atoms in the molecule. Estimating uncertainties of these ensembles further amplifies this challenge. Inference in this regime necessarily requires more information, which in principle can be obtained by combining measurements with simulations that leverage prior physical understanding encoded in atomistic force fields.

Despite recent advances in force field development (9, 10), simulation benchmark stud- 
ies have demonstrated continuing inaccuracies in molecular dynamics (MD) force fields (11). 
Force field modifications based on direct fitting to NMR measurements have also been demon- 
strated (12, 13, 14), but such work has optimized only a small fraction of the required force 
field parameters. Thus, simulations are often unable to recapitulate ab initio the wide variety 
of measurements available on molecular systems. This inaccuracy poses a challenge when one 
desires atomic-scale models that are both consistent with presently available measurements 
and predictive of those yet to be measured. 

Here, we introduce a practical statistical approach to modeling solution ensembles of bi- 
ological macromolecules. The algorithm, Bayesian Energy Landscape Tilting (BELT), uses 
solution experiments to reweight an ensemble of atomistic models predicted (perhaps inac- 
curately) by molecular dynamics. BELT generalizes a recently proposed maximum entropy 
method (15) to the practical scenario in which the experimental measurements and their 
estimated relationships to atomic conformation carry error. In particular, BELT leverages 
Markov Chain Monte Carlo (16) to transform experimental ambiguity into error bars on 
arbitrary structural features. The final output of BELT modeling is a hyperensemble, or 
an "ensemble of ensembles", which we show is closely connected to a generalized ensemble 
theory proposed by Crooks (17). This hyperensemble is a collection of statistical samples, 
each of which is itself a conformational ensemble that corresponds to a maximum-entropy 
solution associated with a particular set of experimental observables. 

The necessity and utility of this approach can be illustrated with a simple example involving one experimental observable. Most previous methods have focused on obtaining a single best-fit conformational ensemble (15, 18, 19). However, ambiguous experimental data often preclude such a point estimate of the conformational ensemble. For example, we plot one measured (19) value of ³J(HN,Hα) in the context of the Karplus (20) equation relating φ to ³J(HN,Hα) (Fig. 1a). The measured coupling corresponds to four different values of φ, precluding description by a single point estimate of φ, much less a single estimate of the distribution of φ. Many different ensembles are consistent with the measurement (Figs. 1b,c), leading to a nearly complete loss of predictive power. A molecular dynamics simulation can establish a prior estimate for the ensemble (Fig. 1d), but may disagree with the observed data beyond measurement error (Fig. 1e). In this case, it has not been obvious how to compute a statistical collection of ensembles that leverages both the simulation and the data; for example, prior Bayesian approaches return uncertainties assuming single conformations, not full ensembles (21). The BELT approach described herein (blue traces in Fig. 1d,e) offers a practical recipe for describing such a hyperensemble, for computing the hyperensemble's predictions for new experimental observables not used in the modeling, and for giving rigorous error estimates on these predictions.
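The ambiguity in Fig. 1a can be reproduced with a few lines of Python. This is a minimal sketch, not the calculation behind the figure: it assumes a Karplus relation of the form J(φ) = A cos²θ + B cos θ + C with θ = φ − 60°, illustrative coefficients (A, B, C) = (7.09, −1.42, 1.55) Hz that may differ from those of ref. (20), and a hypothetical measured coupling of 6.0 Hz. The helpers `karplus_j` and `phi_solutions` are hypothetical names. Because the relation is quadratic in cos θ, and each admissible value of cos θ corresponds to two torsions, one coupling can map onto as many as four values of φ.

```python
import numpy as np

# Assumed, illustrative Karplus coefficients (Hz) for 3J(HN,Ha).
A, B, C = 7.09, -1.42, 1.55
PHASE = -60.0  # assumed convention: theta = phi - 60 degrees

def karplus_j(phi_deg):
    """Predicted 3J(HN,Ha) in Hz for backbone torsion phi (degrees)."""
    theta = np.deg2rad(np.asarray(phi_deg, dtype=float) + PHASE)
    return A * np.cos(theta) ** 2 + B * np.cos(theta) + C

def phi_solutions(j_obs):
    """All phi in [-180, 180) whose predicted coupling equals j_obs."""
    # The Karplus relation is a quadratic in c = cos(theta).
    roots = np.roots([A, B, C - j_obs])
    phis = []
    for c in roots:
        if abs(np.imag(c)) > 1e-12 or not -1.0 <= np.real(c) <= 1.0:
            continue  # no real torsion produces this root
        t = np.degrees(np.arccos(np.real(c)))
        for theta in {t, -t}:  # cos is even, so +theta and -theta both work
            phis.append((theta - PHASE + 180.0) % 360.0 - 180.0)
    return sorted(phis)

j_obs = 6.0  # hypothetical measured coupling (Hz)
for phi in phi_solutions(j_obs):
    print(f"phi = {phi:7.1f} deg -> J = {karplus_j(phi):.2f} Hz")
```

Couplings near the maximum of the curve admit only two solutions, and values above it admit none, which is why the number of compatible torsions depends on where the measurement falls.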

After laying out the theoretical framework for BELT, this study presents in-depth tests based on assessing the convergence of ensembles constructed from force fields with radically different properties. We investigated the conformational propensities of trialanine using NMR measurements (19) and MD simulations performed in five different force fields. The small size of this model system enabled assessment of BELT without complications from incomplete sampling. At the same time, trialanine populates multiple conformational states and allows incisive tests of ensemble modeling. Although the raw simulations show wide variations in their conformational preferences, BELT corrects force field errors to provide concordant estimates of the α, β, and PPII populations. The ability to correct the biases of diverse force fields provides a stringent test of the proposed scheme for connecting simulation and equilibrium measurements.

Theory: Bayesian Energy Landscape Tilting 
Model Inputs 

Modeling an ensemble with BELT requires three components. First, we need conformations $x_j$ ($j = 1, \ldots, m$) sampled from the equilibrium distribution of some physically realistic model. This model serves as a prior on structural properties; in the absence of experimental data, the BELT model inherits the properties of the conformations $x_j$. In the present work, such conformations are generated from molecular dynamics simulations. Second, we require equilibrium experimental measurements $F_i$ ($i = 1, \ldots, n$) and their associated uncertainties $\sigma_i$ ($i = 1, \ldots, n$). Third, we need a direct connection between simulation and experiment. This connection is achieved by predicting each experimental observable at each conformation: $f_i(x_j)$ is the predicted value of experiment $i$ at conformation $x_j$.

Reweighting 

The next step in constructing an ensemble is to calculate the population of each conformation. Inspired by a previous method for restraining simulations (15) (see Appx. S1), we reweight individual conformations by a biasing potential that is a linear combination of the predicted
Figure 1: (a) The Karplus equation connecting the backbone torsion φ to ³J(HN,Hα). The measured value of ³J(HN,Hα) is shaded gray and is consistent with multiple values of φ. (b, c) Histograms of four chemically unrealistic ensembles that recapitulate the measured (gray) value of ³J(HN,Hα). Each histogram is represented both in the backbone torsion φ (b) and in its projection (via the Karplus equation) onto ³J(HN,Hα) (c). Dashed vertical bars represent the average ³J(HN,Hα) for each corresponding ensemble. (d, e) The molecular dynamics (ff99) ensemble (red) is inconsistent with the measured ³J(HN,Hα). Four samples (blue) from the BELT hyperensemble show good agreement with the measured value of ³J(HN,Hα). For this figure only, the uncertainty (σ) on ³J(HN,Hα) was increased 2.5-fold to better illustrate differences. Density spikes in (c, e) correspond to values where dJ/dφ = 0.
observables:

$$
\Delta U(x; \alpha) = \sum_{i=1}^{n} \alpha_i f_i(x)
$$
In $\Delta U(x; \alpha)$, the parameters $\alpha_i$ determine how strongly each experiment contributes to the biasing potential. As shown previously (15), such a linear biasing potential gives a maximum entropy ensemble for some set of experimental observations. Rather than selecting the single best such ensemble, BELT samples over a distribution of maximum entropy ensembles, each parametrized by $\alpha$, so as to estimate the uncertainty in the ensemble modeling. This approach is connected (see Appx. S1) to recent work by Crooks that proposed an entropic prior for modeling hyperensembles in general physical problems.

The end result is a collection of 'landscape-tilted' ensembles (Fig. 1e). That is, each conformational ensemble is a perturbed version of the initial molecular dynamics ensemble, reweighted (see Appx. S2) according to energetic perturbations that are linear in the experimental observables $f_i(x)$:

$$
\pi_j(\alpha) = \frac{\exp\left[-\Delta U(x_j; \alpha)\right]}{\sum_{k=1}^{m} \exp\left[-\Delta U(x_k; \alpha)\right]}
$$
With the equilibrium populations, we can calculate the equilibrium expectation of an arbitrary observable $h(x)$:

$$
\langle h(x) \rangle_\alpha = \sum_{j=1}^{m} \pi_j(\alpha)\, h(x_j)
$$
In the above bracket notation, $\langle h(x) \rangle_\alpha$ is the ensemble average of $h(x)$ in an ensemble that is perturbed by the biasing potential $\Delta U(x; \alpha)$. We have not yet discussed how to determine the parameters $\alpha_i$. The key idea, however, is that the $\alpha$-reweighted ensemble $\langle \cdot \rangle_\alpha$ should recapitulate the experimental measurements:

$$
\langle f_i(x) \rangle_\alpha \approx F_i
$$
Forcing this to be an exact equality recovers previous results (15) that can be derived from maximum entropy considerations (Appx. S1); here, however, we take into account the experimental uncertainties associated with each $F_i$.
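The reweighting and averaging steps above reduce to a few array operations. The sketch below assumes uniform raw MD weights ($\pi_j^0 = 1/m$), energies in units of kT, and a hypothetical array layout in which `f[j, i]` holds $f_i(x_j)$; the helper names `populations` and `ensemble_average` are illustrative and are not the FitEnsemble API. Subtracting the largest exponent before exponentiating keeps the normalization numerically stable without changing $\pi_j(\alpha)$.

```python
import numpy as np

def populations(alpha, f):
    """Tilted populations pi_j(alpha) for m frames with uniform prior weights 1/m.

    f has shape (m, n): f[j, i] is the predicted value of observable i at frame j.
    """
    log_w = -f @ alpha                 # -Delta U(x_j; alpha), assumed in units of kT
    w = np.exp(log_w - log_w.max())    # subtract the max to avoid overflow
    return w / w.sum()

def ensemble_average(alpha, f, h):
    """<h(x)>_alpha for per-frame values h of shape (m,) or (m, k)."""
    return populations(alpha, f) @ h

rng = np.random.default_rng(0)
f = rng.normal(size=(1000, 2))         # synthetic predictions, not real data
print(ensemble_average(np.zeros(2), f, f))           # alpha = 0: the raw MD average
print(ensemble_average(np.array([1.0, 0.0]), f, f))  # alpha_1 > 0 lowers <f_1>
```

With $\alpha = 0$ the tilted ensemble is the raw MD ensemble; increasing $\alpha_1$ shifts weight toward frames with small $f_1(x)$, which is why $\langle f_1(x) \rangle_\alpha$ drops.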

Determining α

A Bayesian framework enables determination of the coefficients $\alpha$ used in the biasing potential. An alternative derivation using the Crooks hyperensemble formalism (17) is given in Appx. S1. BELT assumes that, given the correct choice of $\alpha$, the predicted observables $f_i(x)$ provide unbiased (but noisy) predictions of the measurements $F_i$. This recipe assumes independence (see Appx. S3) and the following conditional probabilities:

$$
P(F_1, \ldots, F_n \mid \alpha) = \prod_{i=1}^{n} P(F_i \mid \alpha)
$$

$$
P(F_i \mid \alpha) \sim N\left(\langle f_i(x) \rangle_\alpha, \sigma_i^2\right)
$$
In the above equation, $N(\cdot, \cdot)$ refers to a normal distribution with specified mean and variance. For the current work, we model $\sigma_i$ as the uncertainty associated with predicting chemical shifts and scalar couplings from structures; this error is quantified by the RMS uncertainty estimated during the parameterization of the chemical shift and scalar coupling models. Using Bayes' theorem, we can calculate the posterior distribution of $\alpha$:

$$
P(\alpha \mid F_1, \ldots, F_n) \propto P(F_1, \ldots, F_n \mid \alpha)\, P(\alpha)
$$

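Now we let $LP(\alpha)$ denote the log posterior of $\alpha$ and simplify, dropping terms that are independent of $\alpha$:

$$
LP(\alpha) = \log P(\alpha \mid F_1, \ldots, F_n) = -\sum_{i=1}^{n} \frac{\left(\langle f_i(x) \rangle_\alpha - F_i\right)^2}{2\sigma_i^2} + \log P(\alpha) + \text{constant}
$$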

Note the simple form of the log posterior. The first term (i.e. the log likelihood) measures 
the agreement between the reweighted ensemble and measurements. The second term is 
the log of the prior distribution on a. 
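The two-term structure of $LP(\alpha)$ can be written down directly. This is a sketch under the same assumptions as the reweighting step (uniform raw weights, `f[j, i]` holding $f_i(x_j)$); the data are synthetic, the function names are hypothetical, and the flat prior is only a placeholder for the priors described next, not one used in this work.

```python
import numpy as np

def populations(alpha, f):
    """pi_j(alpha) proportional to exp(-sum_i alpha_i f_i(x_j)), uniform prior weights."""
    log_w = -f @ alpha
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def log_likelihood(alpha, f, F, sigma):
    """First term of LP(alpha): -sum_i (<f_i>_alpha - F_i)^2 / (2 sigma_i^2)."""
    predicted = populations(alpha, f) @ f
    return -0.5 * np.sum(((predicted - F) / sigma) ** 2)

def log_posterior(alpha, f, F, sigma, log_prior):
    """LP(alpha) up to an additive constant; log_prior is any callable of alpha."""
    return log_likelihood(alpha, f, F, sigma) + log_prior(alpha)

rng = np.random.default_rng(0)
f = rng.normal(size=(500, 3))          # synthetic predicted observables
F = f.mean(axis=0) + 0.1               # hypothetical measurements
sigma = np.full(3, 0.1)
flat = lambda a: 0.0                   # flat prior, for illustration only
print(log_posterior(np.zeros(3), f, F, sigma, flat))
```

Here each raw average misses its measurement by exactly one σ, so the log likelihood at $\alpha = 0$ is $-3/2$; a negative $\alpha$ raises the predicted averages toward $F_i$ and increases it.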

In the present work, we evaluate three different choices of prior (Appx. S4), finding similar results for each. The first is the maximum entropy (maxent) prior, which penalizes ensembles as they deviate from the raw simulation results:

$$
\log P(\alpha) = -\lambda \sum_{j=1}^{m} \pi_j(\alpha) \log \frac{\pi_j(\alpha)}{\pi_j^0}
$$

In the previous expression, $\pi_j^0$ refers to the populations of the unweighted ensemble, which are typically $1/m$, while $\lambda$ is a hyperparameter that controls the strength of the prior. We also consider a Dirichlet prior, which is functionally similar to the maxent prior (Appx. S4).

The third prior we consider is a multivariate normal (MVN) prior, $\alpha \sim N(0, \Sigma)$, with $(\Sigma^{-1})_{ij} = \lambda \, \mathrm{Cov}(f_i(x), f_j(x))$, as derived in Appx. S4.

Each of these priors can be used to achieve regularization, which is a powerful technique to reduce overfitting (22). Large values of $\lambda$ favor the raw simulation results (i.e., uniform conformational populations): $\pi_j \to \pi_j^0 = 1/m$. The value of $\lambda$ can be chosen via cross-validation or other methods (see Appx. S5). When using the maxent prior in the limit of large $\lambda$ and $\sigma \to 0$, BELT recovers the hyperensemble picture of nonequilibrium statistical mechanics developed by Crooks (17) (see Appx. S1). The Dirichlet and normal priors do not share the same connection to the Crooks hyperensemble formalism; however, for normally distributed observables, all three priors give identical results (23).
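One way to see why the maxent and MVN priors behave similarly is to compare them numerically. A second-order expansion of the relative entropy about $\alpha = 0$ gives $\sum_j \pi_j \log(\pi_j/\pi_j^0) \approx \tfrac{1}{2}\alpha^\top \mathrm{Cov}(f)\,\alpha$, so the maxent prior is locally a Gaussian in $\alpha$ with precision $\lambda\,\mathrm{Cov}(f)$. The sketch below checks this on synthetic data; it is our illustration of that expansion rather than the derivation in Appx. S4, the function names are hypothetical, and the Dirichlet prior is omitted because its exact form is not reproduced here.

```python
import numpy as np

def populations(alpha, f):
    log_w = -f @ alpha
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def log_prior_maxent(alpha, f, lam):
    """-lam * sum_j pi_j log(pi_j / pi0_j), with pi0_j = 1/m (raw MD weights)."""
    pi = populations(alpha, f)
    pi = pi[pi > 0]                     # 0 * log 0 contributes nothing
    return -lam * np.sum(pi * np.log(pi * f.shape[0]))

def log_prior_mvn(alpha, f, lam):
    """Gaussian prior on alpha with precision lam * Cov(f) (population covariance)."""
    cov = np.cov(f, rowvar=False, bias=True)
    return -0.5 * lam * alpha @ cov @ alpha

rng = np.random.default_rng(0)
f = rng.normal(size=(5000, 3))          # synthetic predicted observables
small = np.array([0.01, -0.02, 0.015])
print(log_prior_maxent(small, f, lam=10.0), log_prior_mvn(small, f, lam=10.0))
```

For small $\alpha$ the two values agree closely; they part ways only as higher moments of $f(x)$ start to matter, which is where the choice of prior can produce small quantitative differences.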
MCMC Sampling of Structural Ensembles

As noted above, because ensemble inference often presents many plausible solutions (21, 24, 25), we avoid statistical methods that return a single solution (e.g., maximum likelihood or maximum entropy). We therefore use Markov chain Monte Carlo (MCMC), as implemented in PyMC (16), to sample the distribution of structural ensembles (one ensemble per sampled $\alpha$) consistent with experiment. The result is an ensemble of ensembles: a statistical ensemble of conformational ensembles. Averaging over all MCMC samples provides posterior mean estimates of arbitrary structural features or experimental observables. Similarly, the MCMC variances provide statistical uncertainties of equilibrium or structural features. A Bayesian bootstrapping procedure (26) can also be used to model the statistical uncertainty of the MD simulations (see Appx. S6).
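The sampling step can be illustrated without PyMC. The sketch below swaps in a hand-written random-walk Metropolis sampler (the work itself uses PyMC), pairs the Gaussian likelihood with an MVN prior whose precision is assumed to be $\lambda\,\mathrm{Cov}(f)$, and runs on synthetic data. The function `sample_hyperensemble` is hypothetical, and its step size, burn-in, and thinning are arbitrary choices for this toy problem, not the settings used in this work. Each retained $\alpha$ defines one conformational ensemble; the mean and standard deviation of a feature across those ensembles are its posterior estimate and uncertainty.

```python
import numpy as np

def populations(alpha, f):
    log_w = -f @ alpha
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def log_posterior(alpha, f, F, sigma, lam, cov):
    """Gaussian log-likelihood plus an MVN prior with precision lam * cov."""
    predicted = populations(alpha, f) @ f
    return -0.5 * np.sum(((predicted - F) / sigma) ** 2) - 0.5 * lam * alpha @ cov @ alpha

def sample_hyperensemble(f, F, sigma, lam, n_steps=20000, step=0.05,
                         burn=2000, thin=10, seed=0):
    """Random-walk Metropolis over alpha; every kept alpha defines one ensemble."""
    rng = np.random.default_rng(seed)
    cov = np.cov(f, rowvar=False, bias=True)
    alpha = np.zeros(f.shape[1])
    lp = log_posterior(alpha, f, F, sigma, lam, cov)
    kept = []
    for t in range(n_steps):
        proposal = alpha + step * rng.normal(size=alpha.shape)
        lp_new = log_posterior(proposal, f, F, sigma, lam, cov)
        if np.log(rng.uniform()) < lp_new - lp:   # symmetric proposal: Metropolis rule
            alpha, lp = proposal, lp_new
        if t >= burn and (t - burn) % thin == 0:  # discard burn-in, then thin
            kept.append(alpha.copy())
    return np.array(kept)

rng = np.random.default_rng(1)
f = rng.normal(size=(2000, 2))         # synthetic predicted observables
F = np.array([-0.3, 0.2])              # hypothetical measurements
sigma = np.array([0.05, 0.05])
alphas = sample_hyperensemble(f, F, sigma, lam=1.0)
h = f[:, 0]                            # any per-frame feature can be used here
estimates = np.array([populations(a, f) @ h for a in alphas])
print(f"posterior mean {estimates.mean():.3f} +/- {estimates.std():.3f}")
```

Because the feature here is itself a fitted observable, its posterior spread is set mainly by $\sigma_1$; a feature that the measurements constrain only weakly would show a correspondingly wider spread.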

Methods 

Molecular Dynamics Simulations 

Trialanine was simulated in the ff96 (27), ff99 (28), ff99sbnmr-ildn (29, 30), CHARMM27 (31, 32), and OPLS-AA (33) force fields, as previously reported (11). Simulations were performed using Gromacs 4.5 (34) at constant temperature (300 K) and pressure (1.01 atm). Each simulation was at least 225 ns long and used the TIP4P-EW water model (35). Conformations were stored every 1 ps.

Chemical Shifts and Scalar Couplings 

All NMR measurements in this work refer to experiments probing the central residue of trialanine (19). The experimental data were measured at pH 2, near the pKa of the C-terminal carboxylate, so a faithful model would require a constant-pH simulation rather than a fixed protonation state. Because such simulations are challenging with current force fields and simulation packages, we simulated the trialanine construct with charged termini, the conditions under which the force fields have been best calibrated and tested. We therefore focus our analysis on the central alanine residue, which should be most robust to pH-dependent effects. Both pH differences and force field inaccuracies will lead to systematic differences between simulation and experiment; indeed, we assess whether BELT robustly corrects these deviations.

Chemical shifts (HN, Hα, Cα, Cβ) for each frame were calculated using a weighted average of ShiftX2 (36), SPARTA+ (37), and PPM (38) predictions; uncertainties for each model were estimated using their reported RMS prediction errors. Overall uncertainties were estimated as $\sigma = \sqrt{\sum_i w_i \sigma_i^2}$, where $w_i$ is the weight ($\sum_i w_i = 1$) of each chemical shift model and $\sigma_i$ is the uncertainty of each chemical shift model. The J couplings were calculated using the following Karplus relations: ³J(HN,C') (20), ³J(HN,Hα) (20), ²J(N,Cα) (19), ³J(Hα,C') (39), ¹J(N,Cα) (19), and ³J(HN,Cβ) (20). J coupling uncertainties were approximated as the RMS errors reported when fitting the Karplus coefficients.
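The consensus chemical shift prediction and its uncertainty amount to a weighted average. In this sketch the predictor outputs, weights, and RMS errors are hypothetical numbers, not the ShiftX2, SPARTA+, and PPM values used here, and `combine_predictors` is an illustrative helper; it assumes the combined uncertainty takes the form $\sigma = \sqrt{\sum_i w_i \sigma_i^2}$.

```python
import numpy as np

def combine_predictors(predictions, weights, sigmas):
    """Weighted consensus prediction and combined uncertainty.

    predictions: (k, m) array, one row per predictor, one column per frame.
    weights: (k,) nonnegative weights summing to 1.
    sigmas: (k,) RMS prediction error of each predictor.
    """
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must sum to 1")
    consensus = weights @ np.asarray(predictions, dtype=float)
    sigma = np.sqrt(np.sum(weights * np.asarray(sigmas, dtype=float) ** 2))
    return consensus, sigma

# Hypothetical Calpha predictions (ppm) for two frames from three predictors.
predictions = [[52.1, 52.4], [52.6, 52.0], [51.9, 52.3]]
consensus, sigma = combine_predictors(predictions, [0.5, 0.3, 0.2], [0.9, 1.0, 1.2])
print(consensus, round(float(sigma), 3))
```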

We divided the available experimental measurements into training and test sets, with the training set consisting of the ³J(HN,C'), ²J(N,Cα), and ³J(HN,Cβ) scalar couplings and the Cα, Cβ, and HN chemical shifts. The test set consists of ³J(HN,Hα), ³J(Hα,C'), ¹J(N,Cα), and the Hα chemical shift. The division into training and test sets serves three purposes. First, it provides a test of overfitting. Second, it allows us to reduce the computational cost of BELT calculations. Third, it allows us to train on data that are approximately uncorrelated; BELT is best suited for working with uncorrelated data.

BELT 

All BELT calculations were performed using the FitEnsemble package (https://github.com/kyleabeauchamp/FitEnsemble). The online FitEnsemble tutorial demonstrates the use of BELT with a single experimental measurement (³J(HN,Hα)). Source code for calculations in this work is freely available at https://github.com/kyleabeauchamp/EnsemblePaper.

The regularization strength $\lambda$ weights simulation versus experimental data. To determine this weighting in an unbiased manner, BELT carries out cross-validation on the simulation data, as described in Appx. S5; this procedure also reduces errors due to finite sampling of equilibrium properties. For each model, we used PyMC to sample at least 5,000,000 values of $\alpha$; sampled values were thinned 100-fold to reduce correlation. The first 5,000 samples (before thinning) were discarded as burn-in. Convergence of MCMC sampling was assessed by visual examination of MCMC traces; a well-sampled and thinned trace appears as white noise, without correlation between one sample and the next. MCMC traces are shown in Fig. S2 and discussed in Appx. S7. To incorporate simulation uncertainty, we used Bayesian bootstrapping (Appx. S6), with two bootstrap replicates.
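For readers unfamiliar with the Bayesian bootstrap (26), the sketch below shows its core idea: instead of resampling frames, each replicate draws frame weights from a flat Dirichlet distribution, Dirichlet(1, ..., 1), and recomputes the weighted average. How these weights enter BELT is our assumption (for example, by replacing the uniform raw weights $\pi_j^0 = 1/m$ before sampling $\alpha$); Appx. S6 gives the actual procedure. The function names are hypothetical, the data are synthetic, and 500 replicates are used only to make the spread visible, whereas this work used two.

```python
import numpy as np

def bayesian_bootstrap_weights(m, rng):
    """Frame weights pi0 ~ Dirichlet(1, ..., 1), as in Rubin's Bayesian bootstrap."""
    return rng.dirichlet(np.ones(m))

def bootstrap_average(h, n_replicates, seed=0):
    """Distribution of a simulation average under Bayesian-bootstrap reweighting."""
    rng = np.random.default_rng(seed)
    h = np.asarray(h, dtype=float)
    return np.array([bayesian_bootstrap_weights(len(h), rng) @ h
                     for _ in range(n_replicates)])

rng = np.random.default_rng(2)
h = rng.normal(size=1000)              # synthetic per-frame observable
reps = bootstrap_average(h, n_replicates=500)
print(h.mean(), reps.mean(), reps.std())
```

This version treats frames as independent. MD frames stored every 1 ps are strongly correlated, so drawing one weight per block of consecutive frames would likely be more appropriate in practice.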

Results 

Short peptides provide crucial tests for evaluating and optimizing molecular dynamics force 
fields (9, 11, 14, 19, 40). Such peptides offer a window into the intrinsic conformational 
propensities of amino acids, free from the secondary structure bias found in statistical surveys 
of protein structures (41). To test the proposed theoretical framework, we used BELT to 
infer the conformational populations of trialanine from chemical shift and scalar coupling 
measurements (19). 

Conformational Propensities of Trialanine Simulations 

Trialanine was simulated (see Methods) in five different force fields. The chosen force fields show considerable variation in their predicted conformational propensities. The ff96 force field shows a bias towards β conformations (population: 51%) (Fig. 2b, red). On the other hand, ff99 strongly favors helical conformations, with a predicted α population of 80% (Fig. 2c, red). The PPII state, known from independent approaches to be the dominant state in solution (19, 40, 42), is the dominant simulated state only in the ff99sbnmr-ildn force field (Fig. 2a, red). Low PPII populations and inconsistency between force fields have been noted previously (9, 11, 14, 19).
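Populations such as those quoted above are ensemble averages of state indicator functions, $\langle \chi_s(x) \rangle$. The sketch below makes this concrete. The (φ, ψ) boundaries are hypothetical and are not the state definitions used for Fig. 2, the frames are invented, and `assign_state` and `state_populations` are illustrative names; passing BELT populations $\pi_j(\alpha)$ as `weights` would give reweighted populations instead of raw MD ones.

```python
import numpy as np

def assign_state(phi, psi):
    """Label frames alpha, beta, PPII, or other (hypothetical boundaries, degrees)."""
    phi = np.asarray(phi, dtype=float)
    psi = np.asarray(psi, dtype=float)
    state = np.full(phi.shape, "other", dtype="<U5")
    helical = (phi < 0) & (psi > -100) & (psi < 50)
    extended = (phi < 0) & ~helical            # negative phi, psi near +/-180
    state[helical] = "alpha"
    state[extended & (phi < -100)] = "beta"
    state[extended & (phi >= -100)] = "PPII"
    return state

def state_populations(phi, psi, weights=None):
    """Population of each state: a weighted sum of indicator functions."""
    states = assign_state(phi, psi)
    if weights is None:
        weights = np.full(states.shape, 1.0 / states.size)  # raw MD: uniform weights
    weights = np.asarray(weights, dtype=float)
    return {s: float(weights[states == s].sum()) for s in ("PPII", "beta", "alpha", "other")}

phi = [-65, -70, -140, -60, 60]        # hypothetical frames
psi = [145, 150, 135, -40, 40]
print(state_populations(phi, psi))
print(state_populations(phi, psi, weights=[0.1, 0.1, 0.2, 0.5, 0.1]))
```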
Agreement with NMR Measurements: MD and BELT Ensembles

Given the differences in conformational propensities, one might expect varying degrees of agreement with the available experimental measurements. This is indeed the case: four out of five force fields show values of the reduced χ² greater than 1.0 (Fig. 3a, red). Because of this considerable error, we examined BELT hyperensembles that incorporate six NMR measurements, the chemical shifts (Cα, Cβ, and HN) and scalar couplings (³J(HN,C'), ²J(N,Cα), and ³J(HN,Cβ)), to reweight each of the five molecular dynamics ensembles. As expected, the BELT hyperensembles accurately recapitulate the six measurements used in the reweighting (Fig. 3a). In a more incisive test, the BELT hyperensembles accurately predicted four measurements (the Hα chemical shift and the ³J(HN,Hα), ³J(Hα,C'), and ¹J(N,Cα) scalar couplings) that were not used to fit the models (Fig. 3b). Predicted and observed NMR measurements are listed in Tables 1, S1, and S2.
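For concreteness, the agreement metric can be computed as below. This sketch assumes that the reduced χ² normalizes by the number of measurements $n$, $\chi^2_{\text{red}} = n^{-1}\sum_i ((\text{predicted}_i - F_i)/\sigma_i)^2$, rather than by degrees of freedom; following the Fig. 3 caption, the BELT value is the mean over MCMC samples. The function names and the numbers in the usage line are hypothetical.

```python
import numpy as np

def reduced_chi2(predicted, measured, sigma):
    """(1/n) * sum_i ((predicted_i - measured_i) / sigma_i)^2."""
    predicted, measured, sigma = (np.asarray(a, dtype=float)
                                  for a in (predicted, measured, sigma))
    return float(np.mean(((predicted - measured) / sigma) ** 2))

def mean_reduced_chi2(predicted_samples, measured, sigma):
    """Average reduced chi^2 over MCMC samples, one predicted vector per sample."""
    return float(np.mean([reduced_chi2(p, measured, sigma) for p in predicted_samples]))

print(reduced_chi2([6.1, 1.9], [5.6, 2.0], [0.5, 0.2]))  # hypothetical values
```

Values near or below 1.0 mean the predictions deviate from experiment by no more than the stated uncertainties.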

Converged Conformational Propensities Observed in BELT Ensembles

Although the raw MD simulations predicted quite different conformational propensities, BELT reweighting gave five ensembles with conformational populations that agreed to within estimated statistical uncertainty (Fig. 2). Quantitative predictions and uncertainties are given in Tables S3-S6. As expected, the lower-accuracy force fields (e.g., ff99) were assigned lower $\lambda$ values than force fields that were able to predict the experimental data a priori (see Supporting Information). The lower-accuracy simulations also gave final predictions that were more uncertain (e.g., PPII frequency of 69 ± 13% for ff99) than force fields that are able to predict experimental data a priori (e.g., PPII frequency of 71 ± 4% for ff99sbnmr-ildn). Nevertheless, the final predictions agreed, and residual modest differences provided practical estimates of systematic error. In general, we find (PPII, β, α) populations of (67 ± 9%, 23 ± 6%, 10 ± 8%); here the mean and uncertainty are approximated as the mean and standard deviation across all force fields and priors.

In addition to convergence between models constructed from different force fields, we also assessed the convergence between BELT models built using different priors on the parameters $\alpha$. In general, different priors gave similar results with small quantitative differences (Figs. 2 and 3). Building BELT models with different priors could therefore also be useful for bracketing uncertainties in situations with limited simulation data.

The Resolution Limit of Trialanine BELT Ensembles 

Despite the near-quantitative agreement in α, β, and PPII populations (Fig. 2) and overall Ramachandran features (Fig. 4), the fine details of the Ramachandran plots differed between the five models. Because all five BELT ensembles showed excellent agreement with experiment (Fig. 3), we concluded that six chemical shifts and scalar couplings were insufficiently informative to resolve (and falsify) subtle force field differences. The most obvious such difference was the width, shape, and orientation of the PPII basin. Most strikingly, ff96 and OPLS-AA gave PPII basins that were vertically oriented in the Ramachandran plot, while ff99, ff99sbnmr-ildn, and CHARMM27 gave diagonally oriented PPII basins. Two different effects contributed to this resolution limit: the information content of the experimental measurements and the uncertainty in the predictors of experimental observables. Again, the BELT strategy of modeling with different starting MD simulations revealed the residual uncertainties from these systematic errors.

Discussion 

Structural Ensemble Biology 

Why model structural ensembles, rather than just structures? At least three compelling 
reasons favor ensembles. First, biological molecules are multi-state machines that fold, un- 
fold, bind ligands, aggregate, and change conformation. Biology is controlled by the relative 
populations of these states. Ensembles capture aspects of these phenomena by encoding 
equilibrium populations with structures. A second argument for ensemble modeling is fi- 
delity to experiment. Most solution experiments measure ensemble average equilibrium 
properties: chemical shifts, scalar couplings, NOEs, SAXS, and FRET can often be approx- 
imated as equilibrium properties. A truly quantitative connection to these measurements 
requires modeling the equilibrium ensemble. Finally, recent advances in atomistic simulation 
(34, 43, 44, 45), special-purpose hardware (46), and distributed computing analysis (47, 48) 
have enabled atomistic simulations to reach the millisecond timescale (49, 50, 51, 52); the 
computational cost of ensemble modeling is quickly becoming manageable. 

One might argue that structural ensembles are unnecessary because many proteins occupy 
a single state under physiological conditions. For such proteins, it is probably safe to enforce 
single state behavior, as is assumed in current modeling approaches. However, we suggest 
that the number of states be inferred — not assumed. 

Comparison to Previous Ensemble Methods 

Previous ensemble modeling efforts that are most similar to BELT share three key ingredients: a state decomposition, an objective function, and inference of the populations of the resulting clusters. For example, this general recipe describes the approach used in previous analyses of homopeptides (19), the EROS technique for SAXS modeling (18), and the Bayesian Weighting (BW) formalism (24, 53). Of these three techniques, only BW goes beyond returning a single best-fit ensemble and instead characterizes the posterior distribution via MCMC; below we therefore focus on BW, as it is most directly comparable to BELT in scope and purpose.

The primary disadvantage of previous techniques is the need for a state decomposition, which must be defined either by hand or by clustering. Working with a given state decomposition can introduce two different errors, depending on the number and quality of states. In the limit of few states, clustering can overly coarsen the system of interest, preventing the model from reproducing multiple experimental observables. At the other extreme, too many states lead to a large number of parameters to be estimated. This leads to poor generalization performance and large errors when predicting experiments not used to train the model, as well as reliance on a subjective choice of how many states are appropriate.
Figure 2: MD and BELT (maxent, Dirichlet, and MVN priors) conformational propensities (for the central alanine residue) in each force field.
[Figure 3 panels: reduced χ² by force field, training set (a) and test set (b)]
Figure 3: The reduced χ² error for MD and BELT (maxent, Dirichlet, and MVN priors) models. The BELT reduced χ² is estimated as the mean reduced χ² over all MCMC samples. (a) Calculated using the six measurements used to fit the BELT model. (b) Calculated using the four measurements not used to fit the BELT model. See Methods for the definition of training and test sets. Note that the training and test sets are not fully independent because all measurements probe the (φ, ψ) backbone torsions.

One symptom of this regime is discontinuity in conformational populations. For example, 
imagine two nearby conformations at the boundary between two BW states — one confor- 
mation on each side of the boundary. In BW, the populations of each conformation could 
fluctuate dramatically with the corresponding state populations. In BELT, however, the 
two conformations will have nearly identical populations if the predicted observables vary 
smoothly. 

BELT avoids arbitrary state decompositions by projecting simulations onto a basis defined only by the information at hand: the unweighted simulation and the function that maps ensembles onto observables. The advantages of working in this basis are threefold. First, in BELT, one estimates a single parameter ($\alpha_i$) for each experimental observable. If the number of experiments is small, as is often the case, the inference problem involves only a few parameters. Second, the predicted observables are a natural basis for biophysical calculations, in that they are the fundamental connection between simulation and experiment. Working in this basis allows direct connection to experiment and often provides insight into the molecular interactions driving biophysical phenomena. For example, the projection onto observables could be used to rationally infer force field parameters, essentially a Bayesian version of the ForceBalance method (54, 55). Third, BELT does not require subjective choices. In the limit of exact measurements, BELT reduces to a previous maximum entropy approach (15) and, more generally, is connected to the Crooks hyperensemble formalism (see Appx. S1).

We also point out some surprising differences between BELT and BW-like methods. BW- 
like methods have the property that the in-state means of features are preserved, leading to 
Figure 4: Ramachandran plots illustrate the discordance of raw MD ensembles and the final agreement of BELT (maxent prior) ensembles over the five tested force fields. Results from alternative BELT priors are shown in Fig. S1. The jagged appearance of the ff99 BELT model is due to limited sampling of PPII configurations in that force field.
an undesirable dependence on the choice of state decomposition. More precisely, suppose that $\chi_s(x)$ is the indicator function of a conformational state $s$. Then in-state averages of the form $\langle \chi_s(x) \rangle^{-1} \langle h(x) \chi_s(x) \rangle$ do not depend on the reweighted populations. BELT, however, does not preserve the in-state averages; in fact, this property is a direct result of BELT's connection to maximum entropy modeling (see Appx. S1 and ref. (15)). As a consequence, the peaks of reweighted histograms are slightly shifted relative to the raw MD results, as observed in Fig. S4. We conclude that BW-like methods are useful for systems with few, well-defined conformational states, while BELT may offer significant advantages in the absence of an obvious state decomposition.
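The contrast between in-state averages can be demonstrated on a toy one-dimensional system. In the sketch below (synthetic data; the two-state split at x = 0, the BW-like factor of 2, and the tilt strength are arbitrary, and `in_state_mean` is a hypothetical helper), a BW-like reweighting multiplies every frame in a state by the same factor and leaves the in-state mean of h(x) = x unchanged, whereas a BELT-like linear tilt $\exp(-\alpha f(x))$ with $f(x) = x$ shifts the in-state means, mirroring the peak shifts described above.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 20000
x = rng.normal(size=m)          # synthetic one-dimensional "conformations"
state = (x > 0).astype(int)     # two-state split: state 0 is x <= 0, state 1 is x > 0
h = x                           # feature whose in-state mean we track

def in_state_mean(pi, s):
    """<chi_s>^-1 <h chi_s> under populations pi."""
    chi = state == s
    return np.sum(pi[chi] * h[chi]) / np.sum(pi[chi])

raw = np.full(m, 1.0 / m)

# BW-like reweighting: one multiplicative factor per state (state 1 up-weighted 2x).
bw = np.where(state == 1, 2.0, 1.0)
bw /= bw.sum()

# BELT-like tilt: pi_j proportional to exp(-alpha * f(x_j)) with f(x) = x, alpha = 0.5.
belt = np.exp(-0.5 * x)
belt /= belt.sum()

for name, pi in [("raw", raw), ("BW", bw), ("BELT", belt)]:
    print(name, [round(float(in_state_mean(pi, s)), 3) for s in (0, 1)])
```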

In addition to BW-like methods, there is also a class of methods in which restrained simulations are used to derive ensembles of hundreds of conformations that, taken together, produce the correct ensemble-average observables (5, 56). Through the use of restraints, such methods have advantages in situations where the initial force field is insufficiently accurate to sample the correct regions of conformation space. Unlike BELT, however, these methods do not yet give a statistical treatment of the uncertainty arising from errors in experiments or in connecting simulations to experiments; new predictions cannot be rigorously falsified or validated in subsequent experiments.

More recently, a similar Bayesian technique for structural ensemble inference has been 
developed (57). 

Comparison to a Previous Trialanine Study 

Our results are in qualitative, but not quantitative, agreement with a previous study of 
trialanine (19) using the same experimental measurements. That study suggested a PPII
population as high as 92 ± 5%, somewhat higher than our 67 ± 9% and with a roughly twofold
lower estimated uncertainty. Three methodological differences may account for this discrepancy.
First, the previous study used likelihood maximization to directly fit the (PPII, β, and α)
populations from a three-state decomposition of their simulations. The use of likelihood 
maximization may give misleading results when the likelihood surface is broad and shallowly 
peaked, as was found in the previous study. However, this does not appear to be the primary 
cause of disagreement, as maximization of the BELT likelihood recovers populations within 
±5% of the values obtained via MCMC sampling. Second, the previous study assumed each
scalar coupling to have an uncertainty of 1 Hz, while we approximate the uncertainties as the
RMS errors determined when fitting the Karplus equations. This weights the measurements 
differently and will lead to quantitative differences in estimated populations. Different choices 
of Karplus coefficients may also lead to different predicted properties, as has been discussed
elsewhere (9, 58). Finally, the previous study's choice of state decomposition may cause slight
differences in estimated conformational populations.
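How the assumed measurement uncertainties alone can shift fitted populations is illustrated by a toy two-state fit. This sketch is not the previous study's procedure or data: each of two hypothetical scalar couplings is modeled as a population-weighted average of hypothetical per-state values, and the population p of state A is obtained by maximizing a Gaussian likelihood, which here reduces to σ-weighted linear least squares. All coupling values and uncertainties are invented for illustration.

```python
import numpy as np


def fit_population(j_state_a, j_state_b, measured, sigma):
    """Maximum-likelihood population p of state A for a two-state model.

    Each observable is modeled as p * J_A + (1 - p) * J_B with independent
    Gaussian errors of width sigma, so the optimum has a closed form
    (weighted linear least squares in p).
    """
    d = j_state_a - j_state_b  # sensitivity of each observable to p
    r = measured - j_state_b   # residual relative to pure state B
    w = 1.0 / sigma**2         # Gaussian likelihood weights
    return np.sum(w * d * r) / np.sum(w * d * d)


# Hypothetical couplings (Hz): per-state values and "measured" averages.
j_a = np.array([5.0, 2.0])
j_b = np.array([9.0, 1.0])
j_obs = np.array([6.0, 1.9])

p_uniform = fit_population(j_a, j_b, j_obs, sigma=np.array([1.0, 1.0]))
p_weighted = fit_population(j_a, j_b, j_obs, sigma=np.array([1.5, 0.5]))
print(f"uniform 1 Hz errors: p = {p_uniform:.3f}")
print(f"per-coupling errors: p = {p_weighted:.3f}")
```

With these invented numbers the first coupling alone would imply p = 0.75 and the second p = 0.9. Equal uncertainties let the first, more sensitive coupling dominate (p ≈ 0.759), while down-weighting it moves the estimate to p = 0.804.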

Performance and Extension to Larger Systems 

The computational performance of BELT depends on several factors. First and foremost,
the cost is proportional to the number of requested MCMC samples (n_samples). The required
number of samples must be determined by convergence analysis of the resulting MCMC
traces. Second, each step of MCMC sampling requires calculations involving each of the m
conformations in the ensemble; m is therefore the second major determinant of computational
cost. Finally, each step of MCMC sampling involves drawing one random variable for each of
the n experimental measurements, so the cost of each MCMC step depends (albeit weakly) 
on n. For the present work (m = 3 × 10^, n = 6, n_samples = 10^), each BELT run required
approximately 2 days of compute time on an Intel 3770K processor. A similar calculation
with only a single experimental observable (n = 1) would take approximately 1.8 days. For
a larger system, such as ubiquitin with one measurement per residue, one might work with fewer
conformations to reduce the computational cost. For example, a calculation with
(m = 5 × 10^, n = 76, n_samples = 10^) would require approximately 1 day.
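The cost structure described above can be seen in a stripped-down sampler. The sketch below is a generic random-walk Metropolis sampler over tilting parameters α, one per measurement, and not the authors' BELT code. It assumes uniform raw weights for the m simulated conformations, the maximum-entropy form π_j(α) proportional to exp(-α · f(x_j)), a Gaussian likelihood with per-measurement uncertainties σ, and, for simplicity, an isotropic Gaussian prior on α in place of the maxent prior used in this work. Each step reweights all m conformations and draws one Gaussian proposal component per measurement, so runtime scales with n_samples × m and only weakly with small n. Function names, step size, prior width, and the demo data are hypothetical.

```python
import numpy as np


def tilted_populations(alpha, f):
    """Populations pi_j(alpha) proportional to exp(-alpha . f(x_j)).

    f has shape (m, n): predicted value of each of n observables for each
    of m conformations (uniform raw weights assumed).
    """
    logw = -f @ alpha
    logw -= logw.max()  # avoid overflow before exponentiating
    w = np.exp(logw)
    return w / w.sum()


def log_posterior(alpha, f, measured, sigma, prior_width):
    pi = tilted_populations(alpha, f)  # touches all m conformations
    predicted = pi @ f                 # tilted ensemble averages
    log_like = -0.5 * np.sum(((predicted - measured) / sigma) ** 2)
    # Assumed isotropic Gaussian prior on alpha (a stand-in, not the maxent prior).
    log_prior = -0.5 * np.sum((alpha / prior_width) ** 2)
    return log_like + log_prior


def sample_alpha(f, measured, sigma, n_samples, step=0.05, prior_width=5.0, seed=0):
    """Random-walk Metropolis over alpha; returns the trace and acceptance rate."""
    rng = np.random.default_rng(seed)
    n = f.shape[1]
    alpha = np.zeros(n)
    lp = log_posterior(alpha, f, measured, sigma, prior_width)
    trace = np.empty((n_samples, n))
    accepted = 0
    for t in range(n_samples):
        proposal = alpha + step * rng.normal(size=n)  # one draw per measurement
        lp_new = log_posterior(proposal, f, measured, sigma, prior_width)
        if np.log(rng.uniform()) < lp_new - lp:
            alpha, lp = proposal, lp_new
            accepted += 1
        trace[t] = alpha
    return trace, accepted / n_samples


# Hypothetical demo: m = 2000 conformations, n = 2 observables.
rng = np.random.default_rng(1)
f = rng.normal(size=(2000, 2))
measured = np.array([0.3, -0.2])
sigma = np.array([0.1, 0.1])
trace, acceptance = sample_alpha(f, measured, sigma, n_samples=2000)
```

As in the text, the length of such a run should ultimately be set by convergence analysis of the trace rather than by a fixed step count.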

Because the present work has focused on a small peptide, we briefly
discuss two possible challenges in applying BELT to larger protein systems. First, the
computational cost of molecular dynamics simulations currently prevents converged equilibrium
simulations of full protein systems; this was one motivation for our choice of trialanine as 
a model system. Second, inaccurate force fields may reduce the overlap between the true 
ensemble and that sampled in simulation. Given a finite simulation length, it is possible that 
no amount of reweighting could provide agreement with experiment. Force field inaccuracy 
may become increasingly important for larger protein systems (59). 
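One simple way to detect this overlap problem, offered here as an illustrative diagnostic rather than part of the BELT procedure described in this work, is to monitor the Kish effective sample size (Σ_j w_j)² / Σ_j w_j² of the reweighted ensemble. The sketch assumes hypothetical standard-normal observable values f and tilts of the form exp(-α f); as the tilt needed to reach a target grows, the weight collapses onto a handful of conformations.

```python
import numpy as np


def kish_ess(weights):
    """Kish effective sample size (sum w)^2 / sum(w^2) of a weighted ensemble."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w**2)


rng = np.random.default_rng(0)
f = rng.normal(size=10_000)  # hypothetical per-conformation observable values

ess_by_alpha = {}
for alpha in [0.0, 1.0, 3.0, 5.0]:
    logw = -alpha * f
    w = np.exp(logw - logw.max())  # tilted weights, rescaled for stability
    ess_by_alpha[alpha] = kish_ess(w)
    tilted_mean = np.sum(w * f) / np.sum(w)
    print(f"alpha = {alpha:.0f}: ESS = {ess_by_alpha[alpha]:8.1f}, <f> = {tilted_mean:+.2f}")
```

For standard-normal f the expected ESS fraction under this tilt is exp(-α²), and the ideal tilted mean is -α. At α = 5 the weighted average cannot move past the most negative sampled value of f, so no amount of reweighting reaches the target, matching the limitation described above.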

Conclusion 

Bayesian Energy Landscape Tilting allows the simultaneous characterization of structural 
and equilibrium properties by generating a Bayesian ensemble of conformational ensembles — 
a hyperensemble. Through its use of MCMC, BELT is robust to ambiguous experiments and 
provides rigorous uncertainty estimates, as illustrated here in the case of a tripeptide system 
with a complex ensemble. BELT models constructed with a handful of NMR measurements 
correct significant force field bias, provide generalizable, force-field-independent trialanine
ensembles, and allow evaluation of residual systematic errors. Important frontiers for BELT 
include the integration of numerous rather than sparse data and extension of the current 
equilibrium framework to prediction of kinetic properties. The principled combination of 
simulation and experiment — and evaluation of convergence from multiple force fields — will 
enable predictive models that might not be achievable using either simulation or experiment 
alone. 
Table 1: Predicted and measured observables. BELT predictions are calculated
using the maxent prior; see Tables S1-S7 for the complete results. The 'all', 'training', and 'test'
datasets have 10, 6, and 4 measurements, respectively. 